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Abstract 

Background: Osteopontin (Eta, secreted sialoprotein 1, opn) is secreted from different cell types including cancer 
cells. Three splice variant forms namely osteopontin-a, osteopontin-b and osteopontin-c have been identified. The 
main astonishing feature is that osteopontin-c is found to be elevated in almost all types of cancer cells. This was 
the vital point to consider it for sequence analysis and structure predictions which provide ample chances for 
prognostic, therapeutic and preventive cancer research. 

Methods: Osteopontin-c gene sequence was determined from Breast Cancer sample and was translated to protein 
sequence. It was then analyzed using various software and web tools for binding pockets, docking and druggability 
analysis. Due to the lack of homological templates, tertiary structure was predicted using ab-initio method 
server - l-TASSER and was evaluated after refinement using web tools. Refined structure was compared with known 
bone sialoprotein electron microscopic structure and docked with CD44 for binding analysis and binding pockets 
were identified for drug designing. 

Results: Signal sequence of about sixteen amino acid residues was identified using signal sequence prediction 
servers. Due to the absence of known structures of similar proteins, three dimensional structure of osteopontin-c 
was predicted using l-TASSER server. The predicted structure was refined with the help of SUMMA server and was 
validated using SAVES server. Molecular dynamic analysis was carried out using GROMACS software. The final model 
was built and was used for docking with CD44. Druggable pockets were identified using pocket energies. 

Conclusions: The tertiary structure of osteopontin-c was predicted successfully using the ab-initio method and the 
predictions showed that osteopontin-c is of fibrous nature comparable to firbronectin. Docking studies showed the 
significant similarities of QSAET motif in the interaction of CD44 and osteopontins between the normal and splice 
variant forms of osteopontins and binding pockets analyses revealed several pockets which paved the way to the 
identification of a druggable pocket. 



Background 

Cancer results from alterations that disrupt the appropri- 
ate controls and balances that direct normal cellular 
growth and development. These changes resulting in 
altered gene products or altered gene expression can 
occur in two classes of genes that interact with each 
other: genes that inhibit tumor suppressor genes and 
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genes that facilitate cell growth and development [1]. 
Malignant tumors are characterized by dysregulated 
growth control, the overcoming of replicative senescence 
and the formation of metastases. Several growth factors 
and cytokines play pivotal roles in the regulation of pro- 
liferation, survival, adhesion and migration of neoplastic 
cells [2]. Decades of scrutiny into the molecular basis of 
cancer have largely focused on what causes oncogenic 
transformation and the incipient emergence of tumors 
[3]. The invasion of tumor cells is a complex, multistage 
process. To facilitate the cell motility, invading cells need 
to change the cell-cell adhesion properties, rearrange the 
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extracellular matrix environment, suppress anoikis and 
recognize their cytoskeletons [4] . 

A biomarker is any substance, which when detected in 
biological samples or tissue, is associated with an in- 
creased risk of a disease. Serum biomarkers are produced 
by body organs or tumors and when detected in high 
amounts in the blood, can be suggestive of tumor activity. 
These markers are nonspecific for cancer and can be pro- 
duced by normal organs as well. Most biomarkers are 
used infrequently for screening purposes. They are more 
often used to evaluate treatment effects or to assess the 
potential for metastatic disease in patients with established 
disease. Osteopontin (OPN) was identified as one such 
biomarker [5]. Osteopontin is a secreted glycoprotein that 
plays important roles in a wide range of biological pro- 
cesses, including tissue remodeling, inflammation, angio- 
genesis, tumor development and immunity to infectious 
disease [6]. Osteopontin also increases expression of 
HIF-la through phosphatidyl inositol 3 ' -kinase/ Acutely 
transforming retrovirus AKT8 in rodent T cell lymphoma 
(PI3-K/Akt) pathway [7]. 

The OPN is a 32.5-kDa multifunctional protein with 
multiple phosphorylation and glycosylation sites and con- 
tains an arginine-glycine-aspartic acid-binding (RGD) do- 
main as well as two heparin-binding sites, one thrombin 
cleavage site (RSK [arginine- serine -lysine]) and a calcium- 
binding site. The protein functions as both a cell attach- 
ment protein and a cytokine that has a signaling function 
through the action of two cell adhesion molecules: av(33- 
integrin and CD44 [8]. It is also a tumor-associated pro- 
tein, which mediates tumor transformation and malignant 
progression. OPN has been proposed to promote tumor 
progression through several mechanisms, including in- 
creased cell survival, migration, invasion, neovasculariza- 
tion, and modulation of immune function. The RGD 
domain of OPN functionally mediates cell adhesion, mi- 
gration and invasion through integrin engagement. Inter- 
action between the RGD domain of OPN and integrin 
receptors leads to Nuclear Factor-KappaB (NF-kB) and 
Focal adhesion kinase (FAK) actvation mainly through 
decreased apoptosis. These data indicate that the pre- 
dominant mechanism, by which OPN promotes tumor 
growth and metastasis through the RGD domain, is en- 
hancement of survival in the tumor microenvironment 
[9]. When OPN is cleaved at the RSK site by thrombin, it 
is separated into two approximately equivalent sized 
pieces, including N-terminal and C-terminal fragments. 
Thrombin is activated by tissue factor (TF) which is over- 
expressed on the surface of cancer cells. Both N-terminal 
and C-terminal fragments increases adhesion and migra- 
tion of cancer cells through interaction with integrins and 
cyclophilin C respectively [10]. Enhanced OPN expression 
has been detected at the tumor site as well as in plasma 
and serum of patients with various types of cancers [11]. 



The existence in humans, of two osteopontin splice vari- 
ants with deletions of exon 4 referred to as osteopontin-c 
or exon 5 called osteopontin-b and the normal osteopon- 
tin referred as osteopontin-a has been described by Young 
et al [12]. Alternative splicing occurs in a region in a 
molecule that is upstream of the central integrin binding 
domain and the C-terminal CD44 binding domain. Inter- 
estingly, osteopontin-b expressed by transfection is unstable 
and the protein is degraded in the proteosome. In addition, 
osteopontin-b RNA is present at consistently low levels of 
expression in breast tissue specimens [13]. Osteopontin-a 
was found to be expressed in both normal and cancer cells 
to a lesser extent whereas osteopontin-c transcripts were 
never detected in the normal tissue samples but were 
present only in tumor cells [14]. The splice variant 
osteopontin-c, which does not contain the sequence 
encoded in exon-4, lacks an important domain for 
calcium induced aggregation and transglutamination. 
Lack of this domain forms the soluble form of the pro- 
tein [15]. Among the three splice variants of osteopon- 
tin expressed in breast cancer, the shortest form, 
osteopontin-c, supports anchorage-independence more 
effectively than the full length form, osteopontin-a. 
Splice variant form, osteopontin-c, is brought about 
through the gain of function by the cancer cells, 
reflected in the activation of unique signal transduction 
pathways. Osteopontin-c coordinately induces oxidore- 
ductase genes that are associated with the mitochon- 
drial energy metabolism and with the hexose mono 
phosphate shunt [14]. 

Taken together, this growing list of studies suggests 
that osteopontin blood levels have a potential as a prog- 
nostic or diagnostic marker in prostate, breast, head and 
neck and other cancers. It should be noted, however, 
that osteopontin is unlikely to be a blood marker that is 
specific to cancer because osteopontin levels are also ele- 
vated in other conditions including sepsis, kidney disease 
and cardiovascular disease. But, the identification of the 
splice variant form of osteopontin-c solved this problem 
[14]. In order to study further about the role and function 
of osteopontin-c, the three dimensional structure 
might be useful, which is yet to be determined through 
x-ray crystallographic or NMR techniques. In this con- 
text, in silico structure prediction of osteopontin-c 
was carried out along with sequence analysis and dock- 
ing studies. 

Methods 

Web based tools 

The web based tools were used for the purpose of trans- 
lation, similarity studies, tertiary structure prediction, 
model refinement, model evaluation, binding pockets 
prediction and docking. The list of websites along with 
web site addresses is shown in Table 1. 
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Table 1 Web based tool list 

S. No Server Website address 



1. 


ExPASy Translate 
tool 


http://web.expasy.org/translate/ 


2. 


SignalP 


http://www.cbs.dtu .d k/services/Sig na I P/ 


3. 


PrediSi 


http://www.predisi.de/index.html 


4. 


ClustalW 


http://www.genome.jp/tools/clustalw/ 


5. 


l-TASSER 


http://zhanglab.ccmb.med.umich.edu/l-TASSER/ 


6. 


SAVES Server 


http://nihserver.mbi.ucla.edu/SAVES/ 


7. 


SUMMA 


http://silvio.cs.uno.edu/proteinrefinementserver/ 


8. 


PocketFinder 


http://www.modelling.leeds.ac.uk/pocketfinder/ 


9. 


Q-Site Finder 


http://www.modelling.leeds.ac.uk/qsitefinder/ 


10. 


ClusPro 


http://nrc.bu.edu/cluster/ 


11. 


DoGSiteScorer 


http://dogsite.zbh.uni-hamburg.de/ 



Sequence source 

Osteopontin-c gene sequence was determined from 
breast cancer sample and it was translated to protein se- 
quence using ExPASy Translate tool (http://web.expasy. 
org/ translate/) [16]. 

Sequence analysis 

Signal sequence of osteopontin-c was predicted using sig- 
nalP [17] and PrediSi servers [18,19]. Tertiary structure 
prediction was carried out using I-TASSER tool [20]. 
Critical Assessment of Techniques for Protein Structure 
Prediction (CASP) is a community- wide experiment for 
testing the state-of-the-art of protein structure predictions 
which takes place every two years since 1994. The I- 
TASSER server (as "Zhang-Server") participated in the 
Server Section of 7th (2006), 8th (2008), and 9th CASPs 
(2010), and was ranked as the No 1 server in CASP7, 
CASP8 and CASP9. Thus, this server selected for tertiary 
structure prediction. The c-score is a confidence score for 
estimating the quality of predicted models by I-TASSER. 
It is calculated based on the significance of threading tem- 
plate alignments and the convergence parameters of the 
structure assembly simulations. The c-score is typically in 
the range of (-5, 2), where a c-score of higher value signi- 
fies a model with a high confidence and vice-versa [20]. 
The quality of the predicted structure was examined using 
an online metaserver SAVES, which uses Procheck [21], 
WhatCheck [22], Verify3D [23], ERR AT [24] and PROVE 
[25] servers. The predicted structure was refined using 
SUMMA server [26] . Molecular dynamic analysis was car- 
ried out using GROMACS (GROningen MAchine for 
Chemical Simulations) software [27,28]. Structure 
visualization was carried out using Accelrys' Discovery 
Studio Visualizer 1.7. Tertiary structure of osteopontin-a 
also predicted by I-TASSER server and subjected to other 
treatments as mentioned for osteopontin-c. 



Determination of conserved regions (domains) 

In order to determine conserved regions (domains) in 
osteopontin-c of human, it was aligned with rabbit, cattle, 
chicken, house mouse, Norway rat and water buffalo osteo- 
pontin sequences using clustalW [29] at http://www.gen- 
ome.jp/tools/clustalw/. RSK and RGD domain comparison 
was achieved by using Discovery Studio Visualizer (Accelrys 
Discovery Studio Visualizer, version 1.7, 2007; Accelrys 
Software Inc., San Diego). Tertiary structures of thrombin 
cleaved fragments were also predicted by I-TASSER server. 
The C-terminal fragment of osteopontin-c was used for 
hypothetical polymer formation using ICM Molsoft tool. 
Six subunits were utilized for the formation of polymer 
formation using import option of the ICM Molsoft tool. 

Docking 

The predicted tertiary structures of osteopontin-a and 
osteopontin-c were docked with CD44 using Cluspro. 
ClusPro is the first fully integrated server that includes 
both docking and discrimination steps for predicting the 
structure of protein-protein complexes. The server can 
be used to discriminate a set of potential complex struc- 
tures from several docking algorithms, or it can generate 
its own structures using DOT or ZDOCK [30]. 

Binding pockets predictions 

PocketFinder and Q-Site finder were utilized for binding 
pocket predictions [31]. PocketFinder is based on the 
Ligsite algorithm written by Hendlich et al [32] which 
was used to predict small molecule binding sites in pro- 
teins. Q-Site finder uses the interaction energy between 
the protein and a simple van der Waals probe to locate 
energetically favourable binding sites. 

Druggable pocket predictions 

DoGSiteScorer is an automated pocket detection and ana- 
lysis tool which can be used for protein druggability as- 
sessment. Based on the three dimensional coordinates of a 
protein, its potential active sites on the protein surface are 
calculated with DoGSiteScorer. DoGSiteScorer is a grid- 
based function prediction method which uses a difference 
of Gaussian filter to detect potential pockets on the pro- 
tein surface and splits them into subpockets. Subse- 
quently, global properties, describing the size, shape and 
chemical features of the predicted pockets are calculated. 
Per default, a simple score is provided for each pocket, 
based on a linear combination on the three descriptors 
describing volume, hydrophobicity and enclosure. For the 
discrimination of the druggability, a subset of meaningful 
descriptors is used in a support vector maschine (libsvm). 
The druggability model was trained and tested on the 
druggable cavity directory dataset consisting of 1069 
structures and yielded prediction accuracies of 88%. For 
each queried input structure, a druggability score 
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Model-5 



Figure 1 Predicted structure of osteopontin-c by l-TASSER. Figure shows five predicted models from l-TASSER Server. Of these five models, 
model 2 is reported as best model by comparing with the electron microscopic structure of Bone sialoprotein. 



between zero and one is returned. The higher the score 
the more druggable the pocket is estimated to be [33-35]. 

Results and discussion 

Sequence analysis 

Osteopontin-c gene sequence was determined from breast 
cancer sample and deposited to Genbank with ID 
JF412667. With the ExPASy Translate tool, a peptide 
sequence was deduced, consisting of 287 amino acid resi- 
dues. This sequence was 100% identical to the protein 
sequence in GenPept database (NP 001035149.1). Both 
the signal prediction tools namely SignalP and PrediSi in- 
dicated the presence of a potential signal peptide in 
osteopontin-c protein. Signal sequence prediction servers 
predicted an N-terminal cleavage site between 16th and 
17th amino acid residues of osteopontin-c sequence. After 
predicting the signal sequence, first 16 amino acid 



residues were identified as signal peptide and were re- 
moved from osteopontin-c sequence. The remaining pro- 
tein sequence was utilized for tertiary structure prediction 
because during protein folding under in-vivo condition, 
the signal sequence is removed. 

X-ray crystallography, NMR and cryo-electron micro- 
scopic studies were used in wet-lab for three dimensional 

Table 2 l-TASSER scores for predicted models of 
osteopontin-c 



S. No 


Model 


C-Scores 


No. of decoys 


Cluster density 


1. 


Model 1 


-3.99 


703 


0.0237 


2. 


Model 2 


-3.56 


544 


0.0364 


3. 


Model 3 


-3.99 


209 


0.0237 


4. 


Model 4 


-4.61 


199 


0.0127 


5. 


Model 5 


-4.43 


179 


0.0151 
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Figure 2 Superposition comparison for predicted and refined 
structure of osteopontin-c. Here all conformations are 
superimposed with a reference structure (Predicted model) using 
RMSD fit. Accelry's Discovery Studio Visualizer 1.7 was utilized for the 
superimposition. A. predicted structure. B. Refined structure. 



structure predictions of proteins [36]. NMR studies car- 
ried out to predict three dimensional structure of only 
RGD tripeptide sequence of osteopontin [37]. Prediction 
of protein structure from amino-acid sequences has been 
one of the most challenging problems in computational 
structural biology for many years. Historically, protein 
structure prediction was classified into three categories: 
(i) comparative modeling, (ii) threading, and (iii) ab-initio 
folding. The first two approaches build protein models by 
aligning query sequences onto solved template structures. 
When close templates are identified, high-resolution 
models could be built by the template-based methods. If 
templates are absent from the Protein Data Bank (PDB) 



library, the models need to be built from scratch, i.e. ab- 
initio folding. This is the most difficult category of protein 
structure prediction. Such difficult task was attained using 
I-TASSER tool [20]. Tertiary structure of osteopontin-c 
was predicted using I-TASSER Server [38]. Predicted 
tertiary structure is shown in Figure 1. I-TASSER result 
provided five models with different c-scores, cluster dens- 
ity and number of decoys as shown in Table 2. 

Predicted model-2 contains the highest c-score and 
also showed high similarity with electron microscopic 
structure of bone sialoprotein (BSP), which belongs to 
same Small integrin-binding ligand N-linked glycopro- 
teins (SIBLINGs) protein family. Both the Predicted 
model-2 and bone sialoprotein were found to have 
thread with globular domain structure. Electron crystal- 
lography is a form of microscopy that uses a beam of 
electrons to construct images of small solids such as 
proteins. This process is used to determine and predict 
the structure and arrangement of a protein from second- 
ary structure crystals such as alpha helices or beta sheets 
based on electron scattering. By electron crystallography 
method BSP structure determined. The BSP is a mono- 
mer possessing a globular structure with a diameter of 
10 ± 1 nm that is linked to a thread-like structure of 
25 ± 6 nm length. The globule is likely to correspond 
to the C-terminal part and the threadlike structure to 
N-terminal part of the protein [39]. 

Small integrin-binding ligand N-linked glycoproteins 
(SIBLINGs), a family of five integrin binding glycopho- 
sphoproteins comprising osteopontin (OPN), bone sialo- 
protein (BSP), dentin matrix protein 1 (DMP1), dentin 
sialophosphoprotein (DSPP) and matrix extracellular 
phosphoglycoprotein (MEPE), are an emerging group of 
molecular tools that cancer cells use to facilitate their 
expansion. SIBLINGs are soluble, secreted proteins that 
can act as modulators of cell adhesion as well as auto- 
crine and paracrine factors by their interaction with cell 
surface receptors such as integrins. BSP and OPN are 
two members of the SIBLING family of genetically 
related proteins that are clustered on human chromosome 
4. These two proteins have several common binding 
partners like CD44, integrins, matrix metalloproteinases 



Table 3 Significant SAVES validation server results of osteopontin-c 


S. No 


Servers 


Parameters 


Before refinement 


After refinement 


1. 


PROCHECK* 


G-Factor Value 


- 0.23 


- 0.33 


2. 


WhatCheck* 


Bond Lengths 


1.257 


0.782 


3. 


Verify3D* 


3D - 1D score > 0.2 


39.93% 


43.86% 


4. 


ERRAT* 


Overall Qualaity 


36.55% 


74.812% 


5. 


PROVE* 


% Outliers 


12.30% 


10.2% 



"*" - Lesser value indicates better quality. 
"#" - Higher value indicates better quality. 
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Time (ns) 

Figure 3 The Root mean square deviation plot (Carbon Alpha 
back bone) obtained from GROMACS tool during molecular 
dynamics simulation for 10 nanoseconds. Root mean square 
deviation (RMSD) of osteopontin-c model during the molecular 
dynamics (MD) simulation. The stabilization of the structure occurred 
in approximately 10 ns. Generated by GROMACS. 



(MMP), and complement factor H (CFH). Because of that, 
they had common interaction domains like RGD and in 
turn structure [40,41]. 

Predicted tertiary structure of osteopontin-c had 
three domains, namely N-terminal domain, central do- 
main and C-terminal domain. RGD and RSK motifs 
and two helical regions and three turns were present in 
central domain. N-terminal end domain consists of 
four antiparallel sheets, two helical regions and five turns. 
C-terminal end domain consists of one sheet, one helical 
region and one turn. Earlier hypothetical structure for 
osteopontin was predicted by Ganss. It was an open 
extended and flexible structure. Model-2 of I-TASSER 
result supported the proposal of Ganss [42] . 

Model-2 was refined using SUMMA Server. The pre- 
dicted structure was refined by fixing side chains, fixing 
problematic loops, removal of amino acid clashes 
(bumps) and energy minimization. Potential functions 
used in structure prediction and refinements are typic- 
ally grouped into two general classes: traditional "phys- 
ical" molecular mechanics potentials and statistically 
derived 'knowledge-based" potentials [26]. The refine- 
ments did not yield any drastic change in the initial pre- 
dicted structure augmenting the correctness of the 
predicted structure, which was confirmed by superim- 
position studies. 



Refined structure and predicted structure were super- 
imposed using Accelrys Discovery Studio Visualizer 1.7 
software. RMSD value for the superimposition was 
found to be 0.92 A 0 . The superimposed structure is 
shown in Figure 2. It was found that the structure of 
osteopontin-c was similar in structure to fibronectin 
which was determined by Amit sharma et al., [43]. Fi- 
bronectin (FN) is a large glycoprotein found on cell sur- 
faces, in the connective tissue matrix and in extracellular 
fluids. It participates in cell adhesion, spreading, migra- 
tion, extracellular matrix formation, hemostasis and 
thrombosis. FN binds to fibrin, collagens, gelatin, DNA, 
integrins, heparin and proteoglycans. Due to the com- 
mon interacting partners both have similar structures 
[43]. 

SAVES validation server results showed that refined 
structure was slightly improved from the predicted 
structures. Comparisons of validation server results are 
listed in the Table 3. Lesser values in ProCheck, 
WhatCheck and PROVE servers result of refined struc- 
ture showed that the structure quality was improved. 
Verify 3D values were found to be insignificant. Higher 
value in ERRAT result also supported the structure re- 
finement process. Molecular dynamics (MD) is a com- 
puter simulation of physical movements of atoms and 
molecules [44]. Molecular dynamic simulation with ex- 
plicit waters for 10 nanoseconds was assessed by GRO- 
MACS software [45]. The root-mean-square deviation 
(RMSD) variations for the molecular dynamic simulation 
are given in the Figure 3. It clearly showed that 
osteopontin-c had average RMSD deviation of 2.79 A 0 . 
Anyway, the refined structure was found to be an im- 
proved one. But one cannot reject predicted structure of 
osteopontin-c because the structure does not showed 
drastic variation in tertiary structure during molecular 
dynamic simulations. RMSD values below 5-6 A 0 are 
generally considered being characteristic for a stable 
protein in molecular dynamics simulations [46]. 

Determination of conserved regions (domains) 

The amino acid sequences of osteopontin were derived 
from human [47], mouse [48], rat [49], rabbit [50], Water 
buffalo [51] and cattle [52]. Multiple sequence alignment 
was carried out using ClustalW program at Eurobean 
Bioinformatic Institute. Based on sequence alignments, the 
amino acid sequence was divided into nine parts which 
were represented in Figure 4. Of these nine parts, only five 



AAl 26-30 36-39 50 70 95- 105 130 140 230 280 2S7 

M ■~GSsiiK VSSEE QETLPS PolyD ESHHSDESDE GRGDS GLRS IDSQ RI$HEL~M r 
I 1 3 4 5 6 7 8 9 I 1 

Figure 4 Selected structural domains and their corresponding locations in the human osteopontin-c. 
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parts have known functions i.e. PolyD - Binds with calcium, 
RGD - Integrin binding site, GLRS - Thrombin cleavage 
site, Eighth part - Calcium Binding site, and Ninth part- 
Heparin Binding site. Of these conserved regions, Poly D 
(4th), GRGDS (6th) and GLRS (7th) regions are well known 
for their functions [53]. Human and Rabbit sequences were 
found to have 64% similarity whereas human and chicken 



were found to have only 21% similarity score in multiple se- 
quence alignment. Multiple sequence alignment is shown in 
Figure 5. Phylogenetic relationships between these organisms 
are shown in Figure 6. Distinct differences were found to be 
present between human and chicken which could reflect 
functional and developmental differences between chicken 
and mammalian osteopontins [54]. 



CLUSTAL 2.0*12 multiple sequence alignment 



Cattle 

Buffalo 

House 

Rat 

Human 

Rabbit 

Chicken 



HR I AVI C FC L L G I AS AL FVKPTS 
HRIAVICFCLLGIASALFVKPTS 
HP L AVI C FC L FGI AS S L PVKVTD 
HRLAWCFCLFGLASCLPVKVAE 

MP I AVI C FC L L G ITC AI P VKQ AD 

HP, I AVI C FC LLGHAYAL P VKHAD 

HKLAFLC LCFI SIAAAVPVSKSRQHAI 



SGS SEEKQ LHNKY PD A V A I WLK PD P S Q KQTFLT 56 

SGS SEEKQ LNJJKYPD A VATVLK PD P S QKQTFL A 56 

SGS SEEK- L YS LHPD P I ATtf L V PD P S QKQNL L A 55 

FGS 5EEKAHY3KHSD A VATBLK PD P 5 Q KQN L L A 56 

SGS SEEK-- — 30 

SGS SEEKQ LYHKH PDALATULN PDPSQ KQHL LT 56 

S AS SEEK- ■ YDPPSHHTHFYHQ 47 



Cattle PQMSVSSEETDPNKQWTL PSKSNESPEQTDDLDDDDD-NSQDVNSNDSDDAETTDDPD- - 113 

Buf f a 1 o PGNSVSSEETPDNKQNTL PSKSNESPEQTDDLDDDDDDNSQDVNSND5DDADTTDDPD- - 114 

House P Q NAVS SEE KDP FKQETL P SNSNE SHDHMDDDD DDDDDD GDHAE SED 5VD SDE SD 110 

Rat P QNSVS S EETDP FKQETL P SNSNE SHDHHDDDDDDDDD - GDHAE SEP 5 VNSDE SO 110 

Human - ON AVS SEETND FKQETL P SKSNE SHDHHDDHDDEDDD — DHVD S QD 5 ID SHP 5DD VDDT 87 

Rabbit P QNAHS SEEKDD L KQETL P SKS IE SHDHVDD I DEDEDD - -DHVMQD SNE SDD ADHPDD - 113 

Ch i ck en DHVD SOS QEHL 0 QTQ JJD LAS L 0 QTHYS SEENADY PEQPDFPDIP5KSQE AVDDDDDDD - - 1 0 5 



Cattle 

Buffalo 

House 

Rat 

Human 

Rabbit 

Chicken 



Cattle 

Buffalo 

House 

Rat 

Human 

Rabbit 

Chicken 



Cattle 

Buffalo 

House 

Rat 

Human 

Rabbit 

Chicken 



HSPESHHSDESPEV D FPTPI PTI AVFTPFI PTE SAND GRGDSVAYGLK-SRS 164 

HSDE SHHSDE SDE A DFPTDI PTI AV FTP FI PTE STUD GRGD SVAYG LK- SRS 165 

o D E SDE T - -VTA S TOADTFT P I V PTVDV P NGP GD S LAY GLP-SES 156 

— E SHHSDE SDE S — FTAS TQADVLTPI A PTVDV PD GRGDS LAY GLR-SKS 156 

DDSHQSDESHHSDESDEL--VTDFPTPLPATEVFTPWPTO)rSfDGRGDSVVYGLR-SKS 144 
— SHHSDE SHO SDE SDE SDE VTVY PTED AATTV FTE W PTVE I YD GR GDS VAYR LKR SKS 171 
— HD SNDTDE SDE V VTD FPTE AP VTP FN — RGDNAGRGDS VAY G FR- AKA 150 

KKFRRSHV Q S PD ATEED FTSHI E SEEHHD A P -KKTSQLT 202 

KKFRRSHYQ S PD ATEED FTSH I E SEEHHD A P — KKTSQLT 203 

RSFQVSDEQYPDATDEDLTSHHKS GESKES LDVI PVAOLL SHP SDQDHHGKGSHES SOLD 216 
RS FP VSDE Q YPD ATDED LTS RHKS QE SDEA IKVI FVAQR L S VP SPQD SHGKTSHE S S Q LP 216 
KKFRRPDI Q YPDATDEDITSHHESEELNGAYKAI FVAQD L HAP SDUDSRGKDS YETS OLD 204 
KHFHVSNA QYP G A SEED L S SHVD SED LDPTPRAI PVAQH LNVPSDWDS QEKD SHDVS 0 VP 2 3 1 

HWKASKLR— -KAARKL IEDD ATAE VGPS Q — — — LAGLWLF 185 

: . . . : :.«:»::: : 

DHSKETHS SE L SKE LTPKAKDK — ■ N-KHSNLIE5QENSKLS— 240 

DHSKETNSRE L SKE LTPKAKDK NNKHSHLIESQEHSKLS 242 

EPS LETHR LEHSKES QE SAD Q SD VI D S Q AS SKAS LEH 253 

E P S VETHS LE Q SKE YKQRASHE STE Q SPA ID S AEK PD AIDS AER SD AI P S OAS SKAS LEH 276 

D Q S AETHSHKQ SRLYKRKANDE SHEHSD VID S QE L SKVSRE F 2 46 

DHS VETO S HE QAR Q Y* T'E ANDN 3VEHSHSIDSQES 3KVS OE S 273 

KE SPE QD SP£ LAQHQ S VEND SR PRFPS PE VP G GD - SKASAGV 226 



Cattle — QEFHSLEDKLD LDHKS -EEDKH L K I R I S HE LD S AS SE VN 278 

Buffalo — QEFHSLEDKLD LDHKS -EEDKHLKIRISHELDSASSEVN 280 

House Q SHK FHSHKDKL VLD PKSKEDDRY LKFRI SHE LE S 5 S SE VN 2 94 

Rat 0 SHE FHSHEDKL VLD PKSKEDPRY LKFRI SHE LE S S S SE VN 317 

Human HSHE FHSHEDHLWD P K 5 FEED L K TP I S HE LD S AS SE VN 287 

Rabbit QSRE FRSHEDKLAIE PKSEEDEEHEG LRVSHELD STSSEIN 314 

Chicken D SPE S L AS AS AVD ASNQTLE S AED AEDRH S IENNE VTR 264 



Figure 5 Multiple sequence alignment of osteopontin. Multiple sequence alignment of osteopontin amino acid residues of Bos tourus (cattle), 
Bubalus bubalis (water buffalo), Homo sapiens (human), Oryctolagus cuniculus (rabbit), Mus musculus (mouse), Rattus norvegicus (Norway rat) and 
Gallus gallus (chicken). The *(star) in the sequence represents identical residues. 
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i — Cattle 



1 — Buffalo 



i — Human 



1 — Rabbit 



i — Mouse 



*—Rat 



1 Chicken 

Figure 6 Phylogenetic tree. Phylogenetic tree obtained using a 
multiple alignment of osteopontin protein sequences from cattle 
(NP_776612), water buffalo (ABD73011), human (NP_00 1035 149.1), 
rabbit (NP_001 075663), mouse (NP_033289), rat (NP_037013) and 
chicken (NP_989866). See text for details. 



Predicted osteopontin-a and osteopontin-c contain both 
RGD and RSK motifs which can be seen in Figure 7. Dif- 
ference in the role of osteopontin-c from osteopontin-a in 
cancer can be clearly seen from Figure 7 that is the expos- 
ure of RSK domain. Due to the exposure of RSK domain, 
thrombin can easily cleave osteopontin-c into two frag- 
ments N-terminal and C-terminal fragments. The frag- 
ments were then subjected to tertiary structure prediction. 
Predicted structures of N-terminal and C-terminal frag- 
ments are shown in Figure 8. Predicted structure showed 
totally different structure when compared with the intact 
osteopontin-a and osteopontin-c. This might be the essen- 
tial reason for the role of osteopontin-c in cancer progres- 
sion and metastasis. C-terminal fragments' polymer might 
form channel like structures which is shown in Figure 8. 
Presence of RGD and RSK motifs in osteopontin were re- 
ported by several studies [55,56]. Thrombin cleaves be- 
tween R and S residues of RSK sequence [55,56]. This 
cleavage occurs within six amino acid residues of the 
GRGDS sequence, raising the interesting possibility that 
thrombin-cleavage further activates osteopontin by allow- 
ing greater accessibility of the GRGDS domain to cell sur- 
face receptors [57-60]. Xuan et al [61] have shown that 
thrombin cleavage of osteopontin abolishes its cell binding 
function. The N-terminal GRGDS -containing osteopontin 
thrombin-cleavage fragment is highly active in promoting 
tumor cell migration [62]. The N-terminal fragment 
contains two integrin binding sites. These integrin 
binding domains include a SVVYGLR domain and a 
well-characterized RGD domain [63,64]. Furthermore, 
a recent study by Mi et al. [65] demonstrated that the 
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Figure 8 N-terminal fragment (A) and C-terminal fragment (B) of osteopontin-c after the action of thrombin. Polyimerized structure 
(C) of C-terminal fragment of osteopontin-c. RGD domains are represented in ball and stick model in N-terminal fragment. Alternate presence of 
helices might be seen in C-terminal fragment which might provide the clue for the formation of pore or fibre. Tube and surface model are 
represented in the Figure C. Polymerization was achieved using ICM Molsoft Tool. 

V . J 



thrombin-cleaved COOH-terminal fragment of osteo- 
pontin can activate downstream signaling and influence 
breast cancer cell migration and invasion in-vitro. The 
COOH-terminal fragment of osteopontin binds with 
another marker of metastatic function (cyclophilin C or 
rotamase) to the CD 147 cell surface glycoprotein (also 
known as extracellular matrix metalloproteinase in- 
ducer or EMMPRIN), to activate Aktl/2 and matrix 
metalloproteinase-2 [65,66]. 

Previous studies in bovine osteopontin clearly dem- 
onstrated that gln-x-gln sequence is required for trans- 
glutaminase activity of cross linking osteopontin 
through gamma-glutamyl-epsilon-lysino peptidyl bond 
formation [67]. Multiple sequence alignment in 
Figure 9 clearly showed that the transglutaminase act- 
ing site is not present in osteopontin-c. The probable 
site is present between 46th and 52nd residues 
(PDPSQKQ) in osteopontin-a. The absence of this 
region in osteopontin-c clearly showed that transgluta- 
minase will not act on it. This might be the essential 
reason for the pathological role of osteopontin-c and 



it may not be crosslinked with extracellular matrix 
and thus it results in cell migration. This fact was 
also supported by the lack of exon-4 expression in 
osteopontin-c by Weber [15]. 

Docking 

Interestingly, OPN binding to CD44 results in the 
propagation of cytosolic signals that enhance integrin ac- 
tivation and thus migration in cancer cells. Binding of 
OPN with CD44 actively promote local proteolysis 
through binding with MMP3 and also activate comple- 
ment factor H (CF-H) which protects cancer cell from 
complement mediated lysis. Both integrins and CD44 
have well-established roles in tumor progression. There- 
fore, interfering with these receptor-ligand interactions 
by controlling receptor cell surface expression, blocking 
receptor-ligand binding or suppressing associated signal 
transduction are promising ways to block both tumor 
development and metastatic dissemination [40]. Thus, 
CD44 is selected as a potential receptor for docking 
studies. Docking analysis was carried out using CD44 as 



Sivakumar and Niranjali Devaraj Journal of Diabetes & Metabolic Disorders 2014, 13:13 
http://www.jdmdonline.eom/content/1 3/1/1 3 



Page 10 of 14 



osteopontin - a MSA 



Transglutaminase site 
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HPJAVICFCLLGIASALPVKPTSSGSSEEKOLNWKYPDAVAIUIWPDPSQKO TFLTPQNS 60 

KRIAVICFCLLGIASAL P VKPTS S GS SEEKQ LNNKY PDAVATU LI^PD PS QKQ TFL AP QN S 60 

HR I AV I C FC L L GHAY A L P VKHAD S GS SEEKQ L YHKH PDA L ATU L M PD P S Q KQ L LTP QN A 60 

HP IAVICFCLLGITCAI PVKOADSGSSEEKQLYWKYPDAVATULH PDPSQKQ L LAPQNA 60 

HRLAVIC FCLFGIASSL PVKVTDSCS SEE!-:- LYS LHPD P I ATWLV PDF SQKQ 'JLLAPQNA 59 

HRL AWC FC L FG L AS C L P VKVAE FGS SEEKAHYSKHSDAVATU L K PD P 5 Q KQ 1LLAPQNS 60 



/AnJLfr jPDPSQKQ 



Cattle 

Buffalo 

Rabbit 

osn-a 

House 

Rat 



VS SEETDDNK Q NTL P SKSME £ PE QTDD LD -DDDD -MS QD VW- - - SND S DD AETTDD PDHS IIS 

VS SEETDDHK QNTL P SKSNE S PE QTDD LD -DDDDDNS QD VN SHD SDD ADTIBD PDHS 116 

US SEEKDD LK QETL P SKS IE SHDHVDD IDEDEDDDHVMQD SNE SDD ADH FDD SHHS 117 

VS SEETND FK QETL P SKSNE SHDHMDDHDDEPDDDHVDS QD S IDSND SDD VDDTDD SHQ S 120 

VS SEEKDD FKQETL P SNSNE SHDHHDDDD -DDDDDD GDHAE SEDSVDSDE— S 109 

VS SEETDD FKQETL P SMSNE SHDHHDDDD -DDDDD - GDHAE SEDSVNSDE— S 109 



osteopontin - c MSA 



House 

Rat 

osn-c 

Cattle 

Buffalo 

Rabbit 



Transglutaminase site 



KRLAVICFCLFGIASSLPVKVTDSGSSEEK- LYS LHPD PIAT¥1VPDPSQKQWL LAP QtJA 59 
KRLAWCFCLFGLASCLPVKVAEFGSSEEKAHYSKHSDAVATWLK PDPSQKQfJLLAPQNS 60 

KRIAVICFCLLGITCAIPVKQADSGSSEEK 

HF I AV I C FC L L G I AS A L P VK FTS S GS SEEK Q L NHKY PD AVA I W LK PD P S QK QfT F LTPQHS 
I IF I A V I C FC L L G I AS A L P VK PTS S GS SEEK Q LMHKY PD AVATU LKPDP5QKQTFLAPQNS 



HF I A V I C FC L L GHAY A L P VKHAD S GS SEEK Q L YHKH PD A L ATM LN P D P S QK Q l-I L LTP Q N A 60 



* *a ■ **** . ■ 



-QMA 33 
60 
-50 



House 

Rat 

osn-c 

Cattle 

Buffalo 

Rabbit 



VS SEEKDD FK QETL P 5 MSHE S HDHMDDDD -DDDDDD GDHAE SED S VI> SDE SD 110 

VS SEETDD FK QETL P S N5NE S HDHHDDDD - DDDDD - GDHAE SED S VN SDE SD 110 

VS SEETND FK QETL PSKSNESHDHUDDHD-DEDDD- -DHVDSQD5IDSNDSDDVDDTDD5 90 

■ • S '- E E TDDNi : Q ITT L P S KSNE S PE OTDD LD - D DDD -H S QD VN S IJD SDDAETTD D --P 112 

VS SEETDDMQ NTL P S KSNE S PE OTDD LD-DDDDDNS QDVWSNDSDDADTTPD P 113 

MS SEEKDD LKQETLPS ¥3 I E S HDHVDD IDEDEDDDHVDNQDSNESDD ADH F DD S 114 



Figure 9 Multiple sequence alignment comparison between osteopontin-a and osteopontin-c with other species with respect to 
transglutaminase acting stie. Alignment analysis clearly showed the absence of transglutaminase site in osteopontin-c. Transglutaminase-2 
acting site is indicated by an open box. 



receptor and osteopontin types osteopontin-a and 
osteopontin-c as ligands. Docking results were analyzed 
manually and interaction sites were analyzed using Dis- 
covery Studio Visualizer 1.7 neighbour analysis tool The 
results of ClusPro docking studies between CD44 and 
osteopontin-a and osteopontin-c showed for the first 
time that both of them interact in a similar site. Inter- 
action of osteopontin to CD44 was proved by many 
studies [39,68] but none of the previous studies deter- 
mined the interaction sites. Docked structures of CD44 
and osteopontins and the interaction domains of CD44 
and osteopontins were shown in Figure 10. It was identi- 
fied that aspargine (233rd residue), serine (234th resi- 
due) and threonine (237th residue) residues were 
essential for interactions in osteopontin isoform se- 
quences. These residues might form the QSAET motif 
essential for the interaction of CD44 and osteopontins 
which is reported first time in the present study. Of 
these three residues, serine and threonine were found to 
be highly conserved in various species as was evident 
from the multiple sequence alingment. As the outcome, 



it was found out that serine and threonine residues are 
essential for interaction with CD44. This result supports 
the result of serine (234th residue) residue being glyco- 
sylated as a post translational modification in human 
osteopontin [69]. 

Binding pockets prediction 

Binding pockets prediction is an essential step towards 
drug designing and docking studies [70]. Predicted binding 
pockets are shown in Figure 11. Pocket Finder detected ten 
pockets in both osteopontin-a and osteopontin-c. Eighth 
pocket of osteopontin-a was found to be present in the 
close region of RSK motif whereas no pocket contains 
RGD motif. First pocket of osteopontin-c was found to 
contain both RSK and RGD motifs. This result again con- 
firms the role of osteopontin-c in cancer biology with 
respect to RSK and RGD motifs. Q-site finder predicted 
ten binding pockets from both osteopontin-a and 
osteopontin-c. Osteopontin-a has two pockets, namely, 
sixth and seventh, which contain RSK and RGD motifs in 
its outer layer, respecively. Osteopontin-c was found to 
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osn - c 



CD44 - OPN C 



Figure 10 Docked structure of CD44 with osteopontin-c and osteopontin-a. A. Docked structure of CD44 (red) and osteopontin-a (green) 
and osteopontin-c (purple). B. Interaction site of CD44 (red) and osteopontin-a (blue). C. Interacation site of CD44 (red) and osteopontin-c (blue). 
It was achieved using Accelry's Discovery Studio Visualizer 1.7 Tool. 




have second and seventh pockets with RGD and RSK mo- 
tifs, respectively. From all these results, the main novelty 
identified was that osteopontin-c contains binding pockets 
with highest rank with RGD and RSK motif, whereas 
osteopontin-a does not. Osteopontin promotes tumor 
growth and metastasis through the RGD domain [9]. 

Druggable pocket identification 

Pockets are ranked according to interaction energy, and 
it is assumed that these relate to locations where a puta- 
tive ligand could bind and optimise its van der Waals 
interaction energy. The total energy of the pocket de- 
fines its ability to bind a small molecule and therefore 
its druggability [71]. Druggable pockets differ from bind- 
ing pockets because druggable pocket was predicted 
using the druggable cavity directory dataset of drugs. A 
druggable pocket in osteopontin-c was predicted by 
DoGSiteScorer [34] and manual inspection of the pocket 
with their energies with the help of Swiss PDB Viewer 
3.7 tool. It was found that the Q-site finder predicted 
eighth pocket with APSD (170th to 173rd residues) and 



TSQLD (184th to 188th residues) motifs can be used as 
a drug target in osteopontin-c due to the presence of 
pocket with maximum energy and druggable score. Amino 
acid residues of this predicted pocket was found to be con- 
served which was proved by multiple sequence analysis 
[Figure 5]. Elevated expression of osteopontin-c has been 
found in many cancers and the level of its expression is 
associated with the metastatic potential of cancer. Thus 
targeting osteopontin using the druggable pocket would be 
a logical approach for cancer management [72]. 

Conclusion 

The key finding of the present study is the discovery, for 
the first time, of the binding site of CD44 in osteopontin-c 
which has aspargine (233rd residue), serine (234th 
residue) and threonine (237th residue) residues. During 
the course of the study, a novel druggable pocket with 
APSD and TSQLD motifs was also found, which will be 
useful for future computer aided drug designing studies. 
Another important finding is that the RSK sequence is 
exposed to thrombin in osteopontin-c splice variant only 
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Osteopontin-a 




Pocket Finder Q-site Finder 




Osteopontin-c 



Figure 11 Predicted binding pockets in osteopontin-c and osteopontin-a. PocketFinder and Q-site Finder predicted pockets in osteopontin-a 
and osteopontin-c are shown as groups. 



as evidenced by the predicted tertiary structure, which ex- 
plains the fact that only osteopontin-c is involved in me- 
tastasis. Due to the action of thrombin, osteopontin-c is 
fragmented into N-terminal and C-terminal fragments 
easily. Hypothetical proposal for the formation of channel 
like conformation by osteopontin-c was achieved with 
help of C-terminal fragment and favors cancer cell migra- 
tion and metastasis. Absence of "PDPSQKQ" sequence in 
osteopontin-c avoids full length protein polymerization by 
transglutaminase-2 and favors metastasis. On the other 
hand, full length osteopontin-a is polymerized by 
transglutaminase-2 and thus, osteopontin-a cannot help 
metastasis because polymer favors cell adhesion. Obvi- 
ously, experimental elucidation might be useful for further 



validation of real time tertiary structure of osteopontin-c. 
Until then, the present predicted structure might be 
used for computational drug design for osteopontin-c 
with respect to prevention of cancer. 
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